\chapter{Establishing the Ethnic Human Mutation Database}

\section{System Design and Implementation}
The design of ThaiMUT is based on a three-tier architecture (client, application server and database). Figure \ref{figure:thaimut_structure} depicts the overall architecture. On the client layer, PHP-based scripts running as CGI on the Apache web server render the graphical web interface. Mutations, SNPs and other human genome reference information are stored in a MySQL relational database. To hide the SQL complexity from users, CGI scripts on the application server compose the queries on their behalf. Users only need to enter a keyword, such as a gene name or a disease name; the web server then translates the request into an SQL query that retrieves the pertinent data from MySQL.
\begin{figure}[ht]
\begin{center}
\includegraphics[height=5in]{figure/thaimut_structure.png}
\end{center}
\caption{Implementation of the ThaiMUT database system: 1) Users search for mutations/variations via a web interface implemented using PHP scripts 2) The Apache web server receives search requests and converts them to SQL commands issued to the MySQL database 3) MySQL returns the query results to the Apache server 4) PHP scripts process the raw output and transform it into HTML or SVG that web browsers can render.}
\label{figure:thaimut_structure}
\end{figure}
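
The keyword-to-SQL translation described above can be illustrated with a minimal Python sketch. It is not the ThaiMUT implementation, which uses PHP and MySQL; the in-memory SQLite database, the table and column names, the demonstration rows and the function \texttt{search\_mutations} are all assumptions made for illustration. The point it shows is that the keyword is passed as a bound parameter instead of being pasted into the SQL text.

\begin{verbatim}
import sqlite3

def search_mutations(conn, keyword):
    """Return rows whose gene or disease name contains the keyword."""
    pattern = f"%{keyword}%"
    sql = (
        "SELECT gene_name, disease_name, nomenclature "
        "FROM mutation "
        "WHERE gene_name LIKE ? OR disease_name LIKE ? "
        "ORDER BY gene_name, nomenclature"
    )
    # The keyword is bound as a parameter, never concatenated into SQL.
    return conn.execute(sql, (pattern, pattern)).fetchall()

# Hypothetical schema and demonstration rows (not taken from ThaiMUT).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mutation "
    "(gene_name TEXT, disease_name TEXT, nomenclature TEXT)"
)
conn.executemany(
    "INSERT INTO mutation VALUES (?, ?, ?)",
    [
        ("HBB", "beta-thalassemia", "c.52A>T"),
        ("G6PD", "G6PD deficiency", "c.871G>A"),
    ],
)
rows = search_mutations(conn, "thal")
\end{verbatim}

Because the user input only ever reaches the database as a parameter value, a keyword containing quotes or SQL fragments is treated as plain text to match.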

To minimize data redundancy in storage, the ThaiMUT database was designed using the entity-relationship (ER) model. The ER model can be separated into three parts (Figure \ref{figure:thaimut_er}): the gene data part contains all genotype data, including associated disease data, for both coding and non-coding regions; the mutation/variation data part contains the details of each mutation or variation and its referenced paper; and the user data part contains each user's data and their submission data.
\begin{figure}[ht]
\begin{center}
\includegraphics[height=5in]{figure/thaimut_er.png}
\end{center}
\caption{The ER diagram of the ThaiMUT application: 1) Gene data 2) Mutation/variation data 3) User data}
\label{figure:thaimut_er}
\end{figure}

The most important element of the database is the data itself. Literature references were gathered using EndNote reference management software. Over 1,000 papers on mutations and variations were reviewed to verify mutations reported in the Thai population. After verification and filtering, the mutations were listed in a spreadsheet file. The spreadsheet file was then parsed and categorized using a Python script based on regular expressions; each type of mutation has its own regular expression pattern. As soon as a mutation was categorized, the script inserted the categorized data into the MySQL DBMS.
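
The categorization step can be sketched as follows. The actual patterns used for ThaiMUT are not given here, so the four patterns below are assumptions modeled on HGVS-style coding DNA nomenclature, and the names \texttt{PATTERNS} and \texttt{categorize} are hypothetical.

\begin{verbatim}
import re

# Assumed HGVS-style patterns, one per mutation type.
# These are illustrative and are not the patterns used by ThaiMUT.
PATTERNS = [
    ("substitution", re.compile(r"^c\.\d+[ACGT]>[ACGT]$")),
    ("deletion", re.compile(r"^c\.\d+(_\d+)?del[ACGT]*$")),
    ("insertion", re.compile(r"^c\.\d+_\d+ins[ACGT]+$")),
    ("duplication", re.compile(r"^c\.\d+(_\d+)?dup[ACGT]*$")),
]

def categorize(nomenclature):
    """Return the first mutation type whose pattern matches."""
    text = nomenclature.strip()
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    # Anything unmatched is left for manual curation.
    return "unclassified"

examples = ["c.76A>T", "c.76_78del", "c.76_77insG", "c.77_79dup"]
labels = [categorize(e) for e in examples]
\end{verbatim}

Entries that match no pattern are returned as \texttt{unclassified} rather than guessed, which keeps ambiguous spreadsheet cells visible to a curator.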

Based on the model-view-controller (MVC) pattern, the ThaiMUT web application was implemented using CodeIgniter, an open-source PHP web framework designed to help developers build web applications following the MVC model. For the Model layer, the MySQL DBMS stores all required data, e.g., gene data, mutation data, variation data, submission data and user data. The Ext JS JavaScript framework was integrated with CodeIgniter as the View layer to interact with users, and PHP code (implemented within the CodeIgniter framework) serves as the Controller layer, carrying the business logic between the View layer and the Model layer. The structure of the ThaiMUT application is shown in Figure \ref{figure:thaimut_mvc}.

\begin{figure}[ht]
\begin{center}
\includegraphics[height=2.5in]{figure/thaimut_mvc.png}
\end{center}
\caption{Layout of the ThaiMUT application based on the MVC web application framework}
\label{figure:thaimut_mvc}
\end{figure}

The content and structure of ThaiMUT follow the guidelines given by the Human Genome Variation Society (http://www.hgvs.org) and Scriver's recommendations (Scriver, et al., 1999). ThaiMUT was constructed so that mutations and SNPs can be incorporated into and queried from the database. We began by incorporating a large number of Thai mutation reports, extracted from the literature indexed in PubMed, into our MySQL database. Some mutations not found in PubMed were collected from local publications and from personal communication with researchers.


In addition to mutations, validated SNPs in the Thai population were also cataloged. These SNPs were obtained from a study (Mahasirimongkol, et al., 2006) that compared allele frequencies and linkage disequilibrium (LD) patterns of 188 drug-related genes between Thais and Northeast Asian populations (Chinese and Japanese). This data set was obtained by genotyping SNPs in 280 individuals from 4 major geographical regions in Thailand. Allele frequencies of these drug-related SNPs were systematically made available for the first time through the ThaiMUT database. Flanking sequences of these SNPs were compared against the latest build of the reference sequence (RefSeq) obtained from the NCBI database (Pruitt, et al., 2007). The SNPs can be visualized along with SNPs from other populations in various public-domain databases, namely dbSNP (Smigielski, et al., 2000), JSNP (Hirakawa, et al., 2002) and HapMap (Thorisson, et al., 2005).

Related genomic and genetic information is supplied along with the mutations and SNPs, for example: locus IDs, gene names, OMIM-associated Mendelian diseases, nomenclatures, allele frequencies and their publication references with link-outs to PubMed. To help scientists compare SNPs across different populations, ThaiMUT integrates the latest public-domain SNP databases, so that SNP maps from each database can be presented together with Thai SNPs. For graphical viewing, the W3C-standard Scalable Vector Graphics format (SVG; http://www.adobe.com/svg) was adopted; users can visualize the location and characteristics of selected SNPs in a comparative view of different populations. An Apache web server was set up to interact with the underlying database and to provide web-based graphical output to users. Users can query both mutations and SNPs via a regular web search box, or select a locus from the ideogram view to get the required information.

To maintain data accuracy and to alert a curator when the database needs updating, ThaiMUT supports direct submission of unpublished data by researchers. Such data are marked as unpublished, and the mark is removed once the data are officially published. The submission form complies with the guidelines given by the Human Genome Variation Society (HGVS; http://www.hgvs.org). Most novel mutations and SNPs are expected to be submitted by members of the Thai Genetic Society who share common interests in Mendelian diseases and genetic epidemiological studies.

To exchange data during user queries, the ThaiMUT application uses Ajax to send user requests to the application backend, so users do not need to reload the page for every search. Each request is sent to the Controller layer through the browser's XMLHttpRequest object, and the queried data are returned from the backend in the HTTP response as XML. The JavaScript framework then processes the XML and renders the results in the user's browser. As with other HTTP requests and responses, this exchange happens behind the scenes, so users do not see exactly which data are being sent and received.
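
The XML exchange can be illustrated with a small Python sketch of both ends of the round trip. ThaiMUT itself produces the XML in PHP and consumes it in JavaScript; the element names (\texttt{results}, \texttt{mutation}, \texttt{gene}, \texttt{nomenclature}) and the two functions are assumptions, since the actual response format is not specified here.

\begin{verbatim}
import xml.etree.ElementTree as ET

def build_response(rows):
    """Serialize (gene, nomenclature) pairs as an XML document."""
    root = ET.Element("results")
    for gene, nomenclature in rows:
        item = ET.SubElement(root, "mutation")
        ET.SubElement(item, "gene").text = gene
        ET.SubElement(item, "nomenclature").text = nomenclature
    return ET.tostring(root, encoding="unicode")

def parse_response(xml_text):
    """Recover the (gene, nomenclature) pairs from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        (m.findtext("gene"), m.findtext("nomenclature"))
        for m in root.findall("mutation")
    ]

# Hypothetical demonstration row (not taken from ThaiMUT).
payload = build_response([("HBB", "c.52A>T")])
decoded = parse_response(payload)
\end{verbatim}

Characters that are special in XML, such as the \texttt{>} in a substitution name, are escaped on serialization and restored on parsing, so the nomenclature survives the round trip unchanged.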

\section{Database Access}
The ThaiMUT database is publicly available at http://gi.biotec.or.th/thaimut. Most popular web browsers, e.g., Internet Explorer (IE6 on Windows XP and IE7 on Vista) and Firefox, should be able to display the mutation/variation contents of ThaiMUT. To access ThaiMUT, JavaScript must be enabled. By default, web browsers enable JavaScript but disable other security-prone features such as ActiveX controls and pop-up windows. Since ThaiMUT relies only on JavaScript, most users should be able to access the web site. To visualize SNPs on genes graphically, Internet Explorer users are required to install the SVG (Scalable Vector Graphics) viewer plug-in made available by Adobe (http://www.adobe.com/svg).

\section{Querying The Database}
For convenience and efficiency, both mutation and variation information in ThaiMUT is queried through an intuitive web interface similar to those of other genomic databases. We built the interface using Ext JS 2.0 (extjs.com), a JavaScript library used in the construction of many web applications. Four main search schemes (described below) are presented as tabs in the control frame of ThaiMUT. A database summary, such as the number and types of mutations, is displayed on the ThaiMUT welcome page. Figure \ref{figure:thaimut_ss1} illustrates the ThaiMUT interface.

\begin{figure}[ht]
\begin{center}
\includegraphics[height=4in]{figure/thaimut_ss1.png}
\end{center}
\caption{ThaiMUT web interface to the database: 1) Two tabs for basic search, one for mutations and the other for variations 2) Mutation advanced search 3) Mutation search by gene name in alphabetical order 4) Mutation search by chromosome 5) Submission system and 6) Online help and contact.}
\label{figure:thaimut_ss1}
\end{figure}

\begin{enumerate}
\item {\it Basic Search}: This is the recommended way to explore mutations and variations in ThaiMUT. The search box accepts any string: a gene name or locus ID, disease name, chromosome number, OMIM number, nomenclature, or the title or author of a reference article. Basic search is provided separately for mutations and for variations.
\item {\it Advanced Search}: Like basic search, this feature accepts strings of all types. It also offers search by mutation type and by the range of years in which an article was published. These criteria can be combined, e.g., a user can search by gene name and by article author at the same time.
\item {\it Alphabetical Search}: This feature arranges gene symbols or names in alphabetical order. Clicking on a letter, i.e., the initial of the gene name, narrows the search space. All genes with recorded mutations under that letter are displayed.
\item {\it Chromosomal Search}: This feature offers a list of the 22 autosomes plus the X and Y chromosomes. Users can jump directly to a chromosome of interest and see where mutations/variations lie on it. Users can click on one of the colored boxes that indicate where mutations/variations occur on the chromosome.


\end{enumerate}
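
The way an advanced search might combine optional criteria can be sketched in Python as follows. The function \texttt{build\_advanced\_query}, the table \texttt{mutation\_report} and its column names are hypothetical, and joining the criteria with \texttt{AND} is an assumption about how the combination behaves.

\begin{verbatim}
def build_advanced_query(gene=None, author=None, mutation_type=None,
                         year_from=None, year_to=None):
    """Compose a parameterized SQL query from the filters provided."""
    clauses, params = [], []
    if gene:
        clauses.append("gene_name LIKE ?")
        params.append(f"%{gene}%")
    if author:
        clauses.append("author LIKE ?")
        params.append(f"%{author}%")
    if mutation_type:
        clauses.append("mutation_type = ?")
        params.append(mutation_type)
    if year_from is not None:
        clauses.append("pub_year >= ?")
        params.append(year_from)
    if year_to is not None:
        clauses.append("pub_year <= ?")
        params.append(year_to)
    # With no filters, fall back to a condition that is always true.
    where = " AND ".join(clauses) if clauses else "1=1"
    return "SELECT * FROM mutation_report WHERE " + where, params

sql, params = build_advanced_query(gene="HBB", year_from=2000,
                                   year_to=2008)
\end{verbatim}

Only the criteria the user actually fills in contribute a clause, so the same function serves a single-field search and a fully combined one.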

\section{Data Submission}

ThaiMUT encourages the human genetics research community, both in Thailand and in other countries, to submit their discovered polymorphisms to the database. Users can click on the submission tab (see Fig. \ref{figure:thaimut_ss1}) to access the submission system. To submit either mutation(s) or variation(s), users are first required to register their email address and contact information. ThaiMUT then issues a password, shown in an alert box, which can be copied and used to log in and use the forms. Separate forms are provided for mutations and for variations. By design, the ThaiMUT database can also accept and store other types of mutations or variations, such as STRs, microsatellites and minisatellites. The submission form strictly follows the proposed mutation entry and quality control form (http://www.hgvs.org/entry.html) recommended by the MDI/HGVS. For security reasons, submitted data are not disclosed for public viewing unless they have been published or verified by a committee (e.g., appointed by the Thai Human Genetic Society) responsible for quality control of submission entries.





